
Holger Groschl 


29th International System Safety Conference

Las Vegas, NV

August 9, 2011 

C. Herbert Shivers, PHD, PE, CSP 





Phrases related to the idea

"Latent conditions," James Reason

"Drift to Failure," Sidney Dekker

"Normalization of deviance," Diane Vaughan

Reason also said, "If eternal vigilance is the price of
liberty, then chronic unease is the price of safety."



Drift to Failure 


Dekker - . . . "drift to failure" is the greatest risk to
today's safe socio-technical systems. 

"Drifting to failure" is a metaphor for the slow,
incremental movement of systems operations toward
(and eventually across) the boundaries of their safety
envelope.

People within the system do not recognize the drift
because of decisions made with incomplete knowledge 
in the face of competition, scarcity, etc. 






How do we know that we are drifting toward failure?
What data or metrics exist that we can rely on to make 
sure that we avoid making the decisions or doing the 
things that take us along that path to failure? 

Do we rely on doing the best we can and serendipity to
get us there? 

Serendipity might help us, but we certainly should not
base a strategy on it.

Is there a model for measuring this drift? Do we need
such a model? 
In the spirit of Reason's Swiss Cheese model, what happens when
we make decisions regarding our design and engineering rigor?

Do we make the holes in the cheese larger, shift the position or 
alignment of the holes, or create more holes, or make the holes 
smaller or fewer? 


We need to focus on decreasing the total permeability of the
barriers rather than on the alignment of the holes.

We shall focus on the "dark side" of Swiss Cheese hole alignment 
for purposes of this discussion, that is, things that go wrong. 
The USS Thresher

The Valero Refinery Fire







Design Considerations


Consider environmental effects on a process, especially when environmental changes are 
intermittent or cyclic rather than constant. 

Calculate actual energy releases possible within a system to be realistic when estimating 
safety margins. 

Build 'damage control' capability into the system where mass and other considerations
allow.

Complex systems can defeat attempts to ensure comprehensive human understanding of 
designs. 

Eliminate common cause failures through proper design for failure tolerance and appropriate 
analysis of accident scenarios. 

Provide sufficient resources (funding, education, expertise) for a proper design review.

Develop controls that detect and correct latent conditions in unused equipment or facilities.

Isolate sources where high energy release potential exists, to contain component or assembly
failures from initiating a chain reaction leading to system failure.

Design intent can be inadequately communicated or misinterpreted as the design progresses
through its life cycle.

Products in the concept phase of the project life cycle should account for the effects of age
and include a means to later analyze the system's integrity.



Test Considerations 


Design engineers must test critical components against worst-case
off-nominal events to uncover single-point failures.

Aggressively test critical hardware/software systems in nominal
and off-nominal operational regimes to flush out latent design
defects and test assumptions.


Analysis Considerations


Prove a system is safe. Actual system performance is indifferent to 
human assumptions. 

Use a systematic approach and technical expertise appropriate to 
the task. 

Rigorously apply analyses and properly interpret the results. 

Conduct and verify hazard analyses to determine where and how 
hazards might arise. 

Encourage and reward hazard identification beyond any checklist
used for inspection.

Hazard analysts may be more challenged to deduce or discover
failure modes overlooked during design than by quantifying risk
inherent to known scenarios.


Configuration Management Considerations


Exercise quality control in the design process and over the design 
products. 

Assess and evaluate adverse impacts on systems when replacing
components or removing portions of the system from the design.
Ensure the changes do not compromise safety, system efficiency,
or system life cycle.

Assess all the impacts on the original design when modifying it,
especially when use has changed and the design is well into its
expected life.

Ensure effective communication and rigorous configuration
management, even with operationally mature programs and 
projects. 


Risk Management Considerations


Consider both the likelihood and consequence of risk - even a 
very unlikely event could jeopardize mission success and crew 
safety. 

Plan for contingencies, and understand systems well enough that
teams can also react to and handle unplanned contingencies.

Review decisions to 'mothball' a system and, if sections must 
remain, render them inert (incapable of energy release). 

Continue questioning initial assumptions about operations,
equipment, and facilities.

Sustaining rigorous maintenance and quality checks underscores
recognition that failure modes cannot always be identified at the
time of a product's inception.

Maintain the level of rigor required to effectively understand and
manage program risks.


Project Management Considerations


Schedule is an important element of any program, but when 
it becomes the big driver, leaders must ensure they 
understand the risks to performance and safety, and 
mitigate appropriately. 

We must not let schedule define our test program, but
rather, let it be defined by risk and technical performance . . .
allow for the chance that we may need another test before
we "go operational."

All project team members must fully understand and
implement program processes and procedures. 





Conclusion 


Common and unique issues contribute to system failures. This 
paper has merely touched on the concept of drift to failure as a 
cautionary message. What, then, is the point? 

The point is James Reason's chronic unease as the price of safety. 

So what should be the focus of our chronic unease? 

Managers and leaders, design team members, fabricators and 
assemblers, analysis and assurance personnel, and others 
associated with operating and maintaining systems, need to pay 
attention to identify the manifestation of individual and collective 
behaviors that might indicate slips in rigor or focus or decisions 
that might eat away at safety margins as our system drifts to 
failure. 

Corrections to drift made during design and development phases
may efficiently prevent or mitigate drift problems occurring in the 
operational phase. 